\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\MVN}{MVN} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Data analysis example: Hurricane lizards

Peter Ralph

Advanced Biological Statistics

Hurricane lizards

Data from: Donihue, C.M., Herrel, A., Fabre, AC. et al. Hurricane-induced selection on the morphology of an island lizard. Nature 560, 88–91 (2018).

Anolis scriptus morphology and performance from before and after hurricanes

Lizard morphology data collected from two visits to Pine Cay and Water Cay in
the Turks and Caicos Islands. Anolis scriptus lizards were surveyed on both
islands four days before hurricane Irma and again, six weeks later, after the
islands had been hit by Hurricanes Irma and Maria. We measured morphology and
performance and found significant differences in the "before" and "after"
hurricane lizard population morphology. All linear measurements are in MM,
surface area measurements are in MM2 and force in N. Counts correspond to the
number of lamellar scales on the forelimb and hind limb toepads.

Figure S1

Here’s the data: dataset; README. First we’ll read it in, reorder the levels of the Hurricane factor (so that “before” comes before “after”), change units on SVL to centimeters so it’s on the same scale as the other variables (useful below), and drop unused variables.

lizards <- read.csv(
            "../Datasets/Hurricane_lizards/lizards.csv",
            header=TRUE, stringsAsFactors=TRUE)
lizards$Hurricane <- factor(lizards$Hurricane, levels=c("Before", "After"))
lizards$SVL_cm <- lizards$SVL / 10
varnames <- c("SVL_cm", "Femur", "Tibia", "Metatarsal", "LongestToe", "Humerus", 
    "Radius", "Metacarpal", "LongestFinger", "FingerCount", "ToeCount", 
    "MeanFingerArea", "MeanToeArea")
lizards <- lizards[, c(c("Hurricane", "Origin", "Sex"), varnames)]
head(lizards)
##   Hurricane   Origin    Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1     After Pine Cay   Male  4.869 10.39 11.87       7.52       7.43    8.66   7.99       2.22          3.19          10       12         1.3327       2.433
## 2     After Pine Cay Female  4.031  8.66  9.79       6.18       6.20    8.01   6.51       2.38          3.55          10       13         0.9613       1.518
## 3     After Pine Cay   Male  5.830 12.87 14.76       9.45       9.58   11.72   9.54       3.54          5.09          14       15         2.6313       4.098
## 4     After Pine Cay Female  4.315  8.55 10.29       6.60       6.26    7.43   6.60       2.79          3.55          11       12         1.1777       1.879
## 5     After Pine Cay Female  4.551 10.26 11.02       6.89       7.02    7.71   7.25       2.52          3.37          11       13         1.3843       2.530
## 6     After Pine Cay Female  4.697 10.02 10.78       6.85       7.18    8.45   7.15       2.39          3.26          12       14         1.4260       2.036

Here’s the sample sizes:

with(lizards, table(Sex, Origin, Hurricane))
## , , Hurricane = Before
## 
##         Origin
## Sex      Pine Cay Water Cay
##   Female       15        18
##   Male         18        20
## 
## , , Hurricane = After
## 
##         Origin
## Sex      Pine Cay Water Cay
##   Female       17        22
##   Male         29        25

Brainstorming

  • What morphological differences do we expect betweeen the “before” and “after” groups of lizards?
  • What visualizations can we use to look for these differences? (Sketch them.)
  • What statistical analysis can we use?
##   Hurricane   Origin    Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1     After Pine Cay   Male  4.869 10.39 11.87       7.52       7.43    8.66   7.99       2.22          3.19          10       12         1.3327       2.433
## 2     After Pine Cay Female  4.031  8.66  9.79       6.18       6.20    8.01   6.51       2.38          3.55          10       13         0.9613       1.518
## 3     After Pine Cay   Male  5.830 12.87 14.76       9.45       9.58   11.72   9.54       3.54          5.09          14       15         2.6313       4.098
## 4     After Pine Cay Female  4.315  8.55 10.29       6.60       6.26    7.43   6.60       2.79          3.55          11       12         1.1777       1.879
## 5     After Pine Cay Female  4.551 10.26 11.02       6.89       7.02    7.71   7.25       2.52          3.37          11       13         1.3843       2.530
## 6     After Pine Cay Female  4.697 10.02 10.78       6.85       7.18    8.45   7.15       2.39          3.26          12       14         1.4260       2.036

Steps in data analysis

  1. Care, or at least think, about the data.

  2. Look at the data.

  3. Query the data.

  4. Check the results.

  5. Communicate.

Ideas

IN CLASS

Goals

  • describe how traits differ before/after hurricanes
  • account for confounding factor: overall size
  • also possibly account for differences between islands and sexes

Exercise: Write a possible conclusion sentence or two (imagining what might happen). Make sure to communicate (a) the size of the effect we’re looking for, in real-world terms; (b) the strength of the statistical evidence; and (c) context (e.g., how’s the size of the hurricane-induced change compare to other differences).

Examples

We found that the typical ratio of hand size to body size was 10% larger after than before (t-test, p-value 0.01). This is roughly 2 standard deviations of the ratios observed within each sex; ratios do not seem to differ between islands.

The body radius decreased by 20% after the hurricane compared to before.

Metatarsal length increased by 5% after the hurricane (possibly due to incrased grip).

We see that lizards with the longest fingers and toes (5% longer on average) survived better on average (t-test of p=0.01); they seem to have been able to hold on better during the hurricane.

Look at the data

Histograms

lizards_long <- pivot_longer(lizards, col=all_of(varnames), names_to="measurement")
ggplot(lizards_long) + geom_histogram(aes(x=value)) + facet_wrap(~measurement)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot of chunk r hists

There is not a huge, obvious shift.

ggplot(lizards_long) + geom_boxplot(aes(x=measurement, y=value, col=Hurricane)) +  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 

plot of chunk r next

ggplot(lizards_long) + geom_boxplot(aes(x=measurement, y=value, fill=Hurricane)) +  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + facet_wrap(~Sex) 

plot of chunk r next2

Let’s do some t-tests to see what happens:

results <- expand.grid(
    variable=varnames,
    sex=levels(lizards$Sex),
    stringsAsFactors=FALSE
)
results$p <- NA
results$lower_CI <- NA
results$upper_CI <- NA

for (j in 1:nrow(results)) {
    v <- results$variable[j]
    s <- results$sex[j]
    x <- subset(lizards, Sex==s)
     t <- t.test(x[[v]][x$Hurricane == "Before"],
            x[[v]][x$Hurricane == "After"]
     )
     results$p[j] <- t$p.value
     results$lower_CI[j] <- t$conf.int[1]
     results$upper_CI[j] <- t$conf.int[2]
}

This is preliminary, but it seems that maybe males need shorter (?) legs, and females need larger (?) fingers.

results[order(results$p),]
##          variable    sex         p lower_CI upper_CI
## 15          Femur   Male 3.030e-07  0.70632  1.49627
## 18     LongestToe   Male 5.133e-06  0.41631  0.98772
## 12 MeanFingerArea Female 1.204e-05 -0.25470 -0.10373
## 16          Tibia   Male 6.230e-05  0.36557  1.02030
## 13    MeanToeArea Female 4.725e-04 -0.36000 -0.10662
## 10    FingerCount Female 1.172e-03 -1.04091 -0.26911
## 17     Metatarsal   Male 1.410e-03  0.14257  0.57219
## 14         SVL_cm   Male 1.917e-03  0.09161  0.39137
## 2           Femur Female 1.054e-02  0.10359  0.75492
## 20         Radius   Male 1.402e-02  0.06389  0.55246
## 6         Humerus Female 3.436e-02 -0.47849 -0.01885
## 22  LongestFinger   Male 1.227e-01 -0.03672  0.30369
## 8      Metacarpal Female 1.354e-01 -0.21897  0.03025
## 21     Metacarpal   Male 2.715e-01 -0.05845  0.20522
## 1          SVL_cm Female 4.324e-01 -0.13187  0.05706
## 11       ToeCount Female 4.970e-01 -0.65824  0.32257
## 5      LongestToe Female 4.980e-01 -0.13888  0.28289
## 19        Humerus   Male 5.556e-01 -0.25189  0.46536
## 25 MeanFingerArea   Male 6.349e-01 -0.18544  0.11373
## 23    FingerCount   Male 6.674e-01 -0.44854  0.28866
## 3           Tibia Female 6.923e-01 -0.33560  0.22413
## 4      Metatarsal Female 7.478e-01 -0.22756  0.16415
## 26    MeanToeArea   Male 8.006e-01 -0.21041  0.27190
## 24       ToeCount   Male 8.869e-01 -0.35325  0.40787
## 9   LongestFinger Female 9.214e-01 -0.15176  0.13740
## 7          Radius Female 9.294e-01 -0.17156  0.18756

Everything is correlated with everything else:

plot of chunk r pairs

sex and size are almost totally confounded

plot of chunk r pairs_sex

plot of chunk r pairs_origin

Let’s look at the Femur ~ SVL plot:

simple_lm <- lm(Femur ~ SVL_cm + Hurricane, data=lizards)
plot(Femur ~ SVL_cm, data=lizards, col=ifelse(Hurricane=="Before", "black", "red"), pch=20)
coefs <- coef(simple_lm)
abline(coefs["(Intercept)"], coefs["SVL_cm"])
abline(coefs["(Intercept)"] + coefs["HurricaneAfter"], coefs["SVL_cm"], col='red', pch=20)
legend("bottomright", lty=1, col=c("black", "red"), legend=c("Before", "After"))

plot of chunk r plitit